[57] ABSTRACT

Noise in a speech-plus-noise input signal is suppressed 
by splitting the input signal into spectral channels and 
decreasing the gain in each channel which has a low
signal-to-noise ratio (SNR). A voice operated switch 
(VOX) acts to detect noise-only input to gate a back- 
ground noise (input signal) estimator and also to gate a 
residual noise (output signal) estimator. The gain in 
each of the channels is controlled by the current value 
(a posteriori) input signal SNR estimate, modified by 
the prior value (a priori) input signal SNR estimate, and 
smoothed as a function of the residual (output noise 
signal) estimate. 
NOISE REDUCTION SYSTEM

This is a Continuation of application Ser. No. 
07/150,762, filed Feb. 1, 1988, now abandoned.

FIELD OF THE INVENTION 

This invention relates generally to acoustic noise
suppression systems and more particularly to an improved
digital processing method for detecting and screening
noise from speech in real time.

BACKGROUND OF THE INVENTION 
Description of the Prior Art 

Acoustic noise suppression systems generally serve
the purpose of improving overall quality of the desired 
signal by distinguishing the signal from the ambient 
background noise. 

Earlier noise suppression systems have used spectral
subtraction techniques and gain modification techniques
in an effort to optimize noise suppression. In those
approaches, the audio input signal is divided into
spectral bands by a bank of bandpass filters, and
particular spectral bands are attenuated using gain
estimators to reduce their noise energy content.

In most prior art techniques, in order to apply the
proper gain factor it is necessary to estimate the energy 
content of the current background noise present as 
accurately as possible. 

Numerous approaches have been attempted to accurately
estimate the current noise, but have met with limited
success. For example, earlier data processing systems
appear to have generally used feed forward systems.
Those systems have been limited in the accuracy of
their noise estimates because they have relied primarily
on the energy in current (present-time) signals in order
to generate their noise estimates.

Later digital signal processing systems have adopted
more sophisticated estimating techniques. For example,
a system which utilizes a minimum mean-square error
short time spectral amplitude estimator is discussed by
Ephraim and Malah. That approach results in a
significant reduction in noise and provides enhanced
speech with colorless noise. Subsequent work along these
lines has produced an error estimation technique that
minimizes the mean-square error of the log-spectra.

Those estimators have been found to lower the residual
noise level without further affecting the speech
itself. However, those estimation techniques in and of
themselves have been unable to remove colorless
background noise. Moreover, those estimating techniques
are essentially mathematical, and the way they are
implemented critically affects their effectiveness within
a total noise reduction system. Further, those approaches
do not appear to rely on previously processed results
but essentially rely on current noisy speech signals.

Systems that have used previously processed signal
information have generally been unsophisticated and
have avoided sophisticated processing techniques. One
such system, taught by Borth in U.S. Pat. No.
4,628,529, uses the occurrence of minima in the
post-processed signal energy in order to control the time
at which the background noise measurement is estimated.
Specifically, Borth discloses a recursive filter which
uses the time averaged value of each speech energy
estimate for making a speech/noise decision in
performing the background noise estimation. However, the
Borth invention was designed to operate in a high noise
background and was not adapted for implementation
using sophisticated digital signal processing.

In addition, Borth and the other prior art systems 
have generally focused on accurately estimating either 
the gain factor or the signal to noise ratio (SNR) of the 
background noise estimator alone and have not used 
previously computed estimators or prior instantaneous 
speech signals at every estimator stage. 

Thus, what is needed is a noise reduction system that 
is useful for high speed digital signal processing and 
which can cope with time varying noise and various 
types of noise, including colored noise and white noise, 
by efficiently using all available noise and speech infor- 
mation. Moreover, what is also needed is a noise reduc- 
tion system that shows excellent performance over a 
wide range of signal to noise ratios and is not limited to 
high background noise applications. What is also 
needed is a noise reduction system that affords algo- 
rithms for deriving more accurate estimators using pre- 
vious as well as current data. Further, what is desired is 
a noise reduction system that simultaneously optimizes 
every estimation step, including the signal to noise ratio, 
the gain, and the amplitude estimation. 

SUMMARY OF THE INVENTION 

According to the invention, in a noise suppression
system for use with speech, a method is provided for
processing noisy speech-containing signals by digital
signal processing means in which time-domain speech
signals are converted to segments containing
time-invariant spectral components, instantaneous
signal-to-noise ratio information is calculated, and a
gain value for each component is obtained with the
signal-to-noise ratio information based on prior
information and on whether the segment is determined to
be likely to contain speech. The gain value is employed
in an amplitude estimate for each component of the
segment, and the components are reconverted into a
time-domain signal. The instantaneous signal to noise
ratio information is calculated by alternative methods,
including recursive algorithms.
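
As a rough illustration of how a per-component gain might be obtained from
signal-to-noise ratio information that draws on prior values, the following
Python sketch applies the recursion that the detailed description gives for
the a priori SNR, then limits the gain and forms the amplitude estimate. It
is a minimal sketch under stated assumptions, not the patented
implementation: the function names are hypothetical, the weight 0.92 is
merely one value inside the 0.9 to 0.95 range mentioned in the description,
the gain floor 0.1 is an arbitrary placeholder, and a simple Wiener-type gain
SI/(1+SI) is swapped in for the gain table, whose contents are not given in
the text.

```python
def a_priori_snr(gain_prev, st_prev, st_now, alpha=0.92):
    """Decision-directed a priori SNR per spectral component.

    SI(n) = alpha * G(n-1)^2 * ST(n-1) + (1 - alpha) * P[ST(n) - 1],
    where P[x] = x for x > 0 and 0 otherwise.
    """
    return [alpha * g * g * sp + (1.0 - alpha) * max(sn - 1.0, 0.0)
            for g, sp, sn in zip(gain_prev, st_prev, st_now)]


def wiener_gain(si):
    """Stand-in gain function (assumption): Wiener-type SI / (1 + SI)."""
    return [x / (1.0 + x) for x in si]


def limit_gain(gain, floor=0.1):
    """Bound the gain from below so that a spectral floor remains."""
    return [max(g, floor) for g in gain]


def estimate_amplitude(gain, noisy_amplitude):
    """Amplitude estimate A = G * R for each spectral component."""
    return [g * r for g, r in zip(gain, noisy_amplitude)]


if __name__ == "__main__":
    si = a_priori_snr([1.0, 0.2], [4.0, 1.0], [3.0, 0.8])
    g = limit_gain(wiener_gain(si))
    print(estimate_amplitude(g, [2.0, 1.0]))
```

Because the previous gain and previous a posteriori SNR are inputs, a caller
would store both between frames, which is the recursive use of stored values
that the summary refers to.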

Initially, the incoming speech/noise signal is seg- 
mented into frequency bins or frames. An instantaneous 
signal to noise ratio for each frame is computed from an 
estimate of the log-spectral amplitude. According to the 
invention, the signal to noise ratio for each frequency 
bin is derived from exponentially averaging the power 
level so as to declare the instantaneous power level the 
noise power level. The signal to noise ratio becomes the 
ratio of the instantaneous power level to the averaged 
noise level. Gain is enhanced at low signal to noise 
ratios. High/low extremes generated in the residual 
noise removal process are minimized to suppress distor- 
tion and atonal noise. 
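
The exponential averaging of the noise power level and the resulting signal
to noise ratio can be sketched as follows. This is an illustrative sketch
only: the function names and the small divide-by-zero guard are assumptions,
and the weight 0.1 is the value the detailed description mentions for one
embodiment. The noise estimate is held while speech is declared and averaged
toward the instantaneous power otherwise.

```python
def update_noise(noise_prev, power, is_speech, alpha=0.1):
    """Per-bin background noise power estimate.

    Held unchanged during speech; otherwise exponentially averaged:
    B(n) = (1 - alpha) * B(n-1) + alpha * power(n).
    """
    if is_speech:
        return list(noise_prev)
    return [(1.0 - alpha) * b + alpha * p
            for b, p in zip(noise_prev, power)]


def a_posteriori_snr(power, noise, guard=1e-12):
    """Ratio of instantaneous power to averaged noise power per bin.

    The guard value is an assumption added to avoid division by zero.
    """
    return [p / max(b, guard) for p, b in zip(power, noise)]


if __name__ == "__main__":
    noise = [1.0, 2.0]
    frame_power = [3.0, 2.0]
    noise = update_noise(noise, frame_power, is_speech=False)
    print(noise, a_posteriori_snr(frame_power, noise))
```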

The invention uses adaptive noise estimators which 
are generated by employing alternative algorithms de- 
pending on current and previous noise and speech esti- 
mates for each frame. In several embodiments, recur- 
sive algorithms which use stored signals and estimators 
are employed. In one embodiment, a current noise- 
speech decision determines the algorithm used to calcu- 
late background noise estimators for current frames. 

In one embodiment, the invention compares current
speech estimators to stored estimators to permit
smoothing of the speech estimator. In another
embodiment, the invention uses a speech/no-speech decision
and adaptive estimation to permit speech smoothing.
The invention may best be understood by reference
to the following description when taken in conjunction 
with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a digital processing 
system for noise reduction, including a noise reduction 
system. 

FIG. 2A is a block diagram of a prior art digital
processing system using a mean square estimating
technique in its noise reduction system.

FIG. 2B is a block diagram of another prior art system
employing limited post processing feedback to
enhance noise reduction.

FIGS. 2C and 2D are generalized block diagrams of
differing embodiments of the invention. 

FIG. 3 is a block diagram of the preprocessing sub- 
system for a digital signal processing system in accor- 
dance with the invention. 

FIG. 4 is a detailed block diagram of another embodi- 
ment of a noise reduction system in accordance with the 
invention. 

FIG. 5 is a block diagram of a post-processing subsystem
for a digital processing system in accordance with
the invention.

FIG. 6A is a logic flow diagram showing digital 
processing steps in accordance with the invention. 

FIG. 6B is a continuation of the logic flow diagram at
FIG. 6A showing digital processing steps in accordance
with the invention.

FIG. 7 is a logic flow diagram illustrating the steps
for calculating the spectral amplitude estimator, A_k(n),
in accordance with the invention.

FIG. 8 is a logic flow diagram illustrating the steps
for calculating the residual noise estimator, RPSD_k(n),
in accordance with the invention.

FIG. 9 is a block diagram showing the steps for
calculating the background noise estimator, B_k(n), in
accordance with the invention.

FIG. 10 is a logic flow diagram which sets forth the
steps for calculating the a posteriori signal to noise ratio,
ST_k(n), in accordance with the invention.

FIG. 11 is a logic flow diagram which sets forth the
steps for calculating the a priori signal to noise ratio,
SI_k(n), in accordance with the invention.

FIG. 12 is a depiction of a gain table in accordance 
with the invention. 

FIG. 13 is a logic flow diagram which sets forth the
steps for calculating gain limiting in accordance with
the invention.

FIG. 14 is a logic flow diagram which sets forth the 
steps for calculating spectral smoothing of the current 
amplitude speech estimator. 

DESCRIPTION OF THE PREFERRED
EMBODIMENT 

The invention is a real-time system which detects and
selectively screens noise in the presence of speech using
adaptive estimation techniques. Adaptive estimation as
used herein includes selecting between alternative
algorithms to calculate a current estimator for a frequency
bin. The decision for determining which algorithm to
use to calculate the adaptive estimator is also based on
current and stored noise and speech criteria. Typically,
one algorithm is recursive while the other sets the
estimator at a constant value depending on current and
stored noise and speech criteria.



The invention thus provides virtually noise-free 
speech in a large variety of wide-band audio applica- 
tions. The invention greatly improves speech percep- 
tion and reduces operator fatigue wherever noise inter- 
feres with communications. 

The invention as described herein uses digital signal 
processing methods and algorithms to discriminate be- 
tween noise and speech throughout the audio spectrum. 
As will become apparent hereafter, the invention is 
highly adaptive and deals efficiently with many differ- 
ent noise environments. In particular, the invention 
copes with noises that vary rapidly and deals efficiently 
with different types of noise, including white noise and 
colored noise. The invention also provides an
improvement in the signal to noise ratio by more than 10 db for
input SNR of 15 db or less.

Inasmuch as the noise reduction system described 
herein is used interactively with other portions of a 
digital signal processing system, the overall digital sig- 
nal processing system in accordance with the invention 
will be described before discussing the features of the 
noise reduction system. Refer now to the block diagram 
for FIG. 1. FIG. 1 shows a generalized digital process- 
ing system 8 in accordance with the invention, includ- 
ing a voice activated switch 60 and noise reduction 
system 50. As shown in FIG. 1, a noisy speech signal 
X(n) is initially received by an automatic gain control 
(AGC) stage 10. Input signal X(n) is a continuous time
varying signal that over time contains both speech and 
noise. The AGC stage 10 provides approximately 50 db 
of dynamic range. The AGC stage 10 uses an array of 
attenuators controlled by AGC parameters provided by 
a preprocessing stage 30 in a feedback relationship with 
the AGC 10. The output of AGC stage 10 is fed to a 
converter (ADC) 20 which converts the signal from 
analog to digital form. The ADC 20 may be a linear 
twelve-bit analog to digital converter or a codec having 
a sampling rate of 8,000 samples per second. A linear 
ADC stage must be preceded by an anti-aliasing filter 
while most codecs have such a filter built in. The digital 
output of ADC stage 20 is forwarded to a voice acti- 
vated switch 60 (VOX) and to a preprocessing stage 
(preprocessor) 30. As illustrated also in FIGS. 2C, 3 and 
4, the output of the VOX 60, which provides a binary 
Speech/No Speech decision, is coupled to the prepro- 
cessor 30 and to a noise reducing stage (noise reducer) 
50. 

Referring still to FIG. 1, the preprocessor 30 seg- 
ments the digitized signal into overlapping frames. Each 
frame is pre-emphasized and weighted in the prepro- 
cessing stage 30 by an appropriate window for subse- 
quent frequency transformation. During preprocessing, 
AGC control parameters are also computed, depending 
on the energy content of each frame. 

Referring now to FIG. 3, there is shown a block 
diagram of the preprocessing stages of a preprocessor 
30 used in the system according to the invention. As is 
generally appreciated, because of the non-stationary 
nature of speech itself, the initial speech signal X(n) 
must be segmented into segments or frames by prepro- 
cessor 30 so that the stationary nature of the speech can 
be assumed. Thus, shown in FIG. 3 is a windowing 
stage 31. In windowing stage 31, frames of 128 samples 
of 16 milliseconds per frame are formed from the digital 
signal with 50% overlap. Each frame is weighted by an 
appropriate window for two reasons: to avoid spectral 
leakage and to permit continuous processing of input 
speech. In various embodiments of the invention, a
Hanning window is used, because when added to itself
with delay of one half the window duration, it sums to 
unity. This property of the Hanning window fits the 
requirements of the "overlap add" method used in steps 
hereafter described. As further shown in FIG. 3,
automatic gain control parameters are also generated at an
AGC processor 32 and are used to adaptively estimate
the peak energy of intervals classified as speech by the
VOX 60 (FIG. 1). AGC processor 32 also sends a signal
to the AGC stage 10 to control each attenuator
according to its corresponding AGC parameter. The
attenuator values are such that no switching side effects are
heard at the digital processing system output. The
dynamic range of the system is up to 50 db. Finally, in
preprocessing stage 30, a pre-emphasis can be
introduced without affecting intelligibility because the first
formant is less important perceptually than the second
one. Pre-emphasis is performed on each frame
according to the following recursive formula:

$$X(n) = Y(n) - a\,Y(n-1)$$

where

Y(n-1) = previous input sample for the current
frame;

Y(n) = current sample;

X(n) = pre-emphasized sample; and

a = a pre-emphasis coefficient.
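
The preprocessing steps described above (128-sample frames with 50% overlap,
a Hanning window, and pre-emphasis) might be sketched in Python as follows.
This is a minimal sketch under assumptions: the function names are
hypothetical, the coefficient 0.95 is a placeholder since the text gives no
value for a, the sample preceding each frame is taken as zero, and the
periodic form of the Hanning window is used because that form sums exactly to
unity when added to itself with a delay of half the window length.

```python
import math


def pre_emphasize(frame, a=0.95, prev=0.0):
    """X(n) = Y(n) - a * Y(n-1), with Y(-1) assumed equal to `prev`."""
    out = []
    for y in frame:
        out.append(y - a * prev)
        prev = y
    return out


def hann_periodic(size):
    """Periodic Hanning window; w[i] + w[i + size // 2] == 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / size)
            for i in range(size)]


def window_frames(signal, size=128, hop=64):
    """Yield windowed frames of `size` samples advanced by `hop` samples."""
    w = hann_periodic(size)
    for start in range(0, len(signal) - size + 1, hop):
        segment = signal[start:start + size]
        yield [s * wi for s, wi in zip(segment, w)]


if __name__ == "__main__":
    tone = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(512)]
    frames = [pre_emphasize(f) for f in window_frames(tone)]
    print(len(frames), len(frames[0]))
```

At the 8,000 samples per second rate mentioned earlier, a 128-sample frame
spans 16 milliseconds, so a hop of 64 samples corresponds to 8 milliseconds.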

Returning now to FIG. 1, it is seen that the frames
X(n) output from preprocessing stage 30 are coupled to
the fast Fourier transform (FFT) stage 40. In FFT stage
40, a short time Fourier analysis is performed on each
frame. Each time frame of the noisy speech is converted
into the frequency domain using a fast Fourier
transform algorithm. As further shown in FIG. 1, frames of
noisy speech that have been converted into the
frequency domain with spectral components Y_k are
coupled from FFT stage 40 to a noise reduction stage (noise
reducer) 50. The noise reducer 50 includes noise
reduction features to be discussed in detail hereinafter. The
noise reducer stage 50 operates to provide at its output
an enhanced speech signal with enhanced spectral
components X̂_k having very low background and residual
noise content. Noise reducer 50 takes advantage of the
major importance of the short time spectral amplitude
of the speech signal and its perception, and utilizes a
mean square estimator for enhancing the noisy speech.
The noise reducer 50 is also responsive to VOX switch
60 as an indicator of the presence or absence of speech
and uses previously stored signals as will be described in
greater detail hereafter.

The VOX switch 60 is used to provide a reliable 
speech/no-speech (Y/N) decision, given an input signal 
even under severe noise conditions. This speech deci- 
sion is used during the estimating stages for the noise 
reducer 50. One example of a VOX switch which may
be used is disclosed in the pending Israeli patent
application Ser. No. 84902 filed Dec. 21, 1987, corresponding
to U.S. application entitled "Voice Operated Switch",
Ser. No. 151,740 filed Feb. 3, 1988, now U.S. Pat. No.
4,959,865 issued Sept. 25, 1990 [Disclosure 11685-4], or
in the commercial product SMARTVOX available at
the time of the filing of the parent application from The
DSP Group, Inc. of Emeryville, Calif. The VOX 60 is
useful for eliminating unnecessary computation on
non-speech (i.e., background noise) segments. As such, other
suitable switches can be used for this purpose. The
voice operated switch in the above-referenced
disclosure examines a segment of input signal to determine if
it has periodic or harmonic content, which is an
indication of the presence of a voiced phoneme and thus the
presence of speech. Other VOX devices which might 
be used are energy threshold detectors, as are common 
in the art of analog signaling. If the VOX 60 is an analog 
signal device instead of a digital device, the VOX input 
may be derived from the analog output of the AGC 10. 
The input to the VOX 60 is merely shown as a represen- 
tation of one possible implementation. 

Referring still to FIG. 1, shown coupled to the output 
of noise reducer 50 is an inverse fast Fourier transform 
(IFFT) stage 70. In this stage, the enhanced spectral 
components are transformed back to the time domain in 
order to reconstruct the signal. The IFFT stage 70 uses 
an inverse fast Fourier transform algorithm to convert 
frequency domain frames back into the time domain. 
Output frames from the IFFT stage 70 are fed to a 
post-processing stage 80. The post-processing stage 80 
reconstructs the enhanced frames using the weighted 
overlap add method and de-emphasis in order to restore 
natural speech spectral rolloff in accordance with
conventional teachings. An output AGC stage 90 is
coupled to the output of the post-processing stage 80 for
controlling the level of the digital signal input to an 
output DAC 100. The output of the output DAC 100 is 
the audible enhanced speech having reduced back- 
ground and residual noise levels. 
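
The reconstruction performed by the post-processing stage might be sketched
as follows. This is a simplified sketch under assumptions, not the stage 80
implementation: it uses plain overlap-add without a synthesis window, which
is a simplification of the weighted overlap add method named above, and the
function names and the coefficient 0.95 are hypothetical. De-emphasis is
written as the inverse of the pre-emphasis formula, y(n) = x(n) + a y(n-1).

```python
def overlap_add(frames, hop):
    """Sum equal-length frames into one signal, advancing `hop` samples each."""
    size = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for i, frame in enumerate(frames):
        for j, value in enumerate(frame):
            out[i * hop + j] += value
    return out


def de_emphasize(samples, a=0.95):
    """Invert pre-emphasis: y(n) = x(n) + a * y(n-1), with y(-1) = 0."""
    out = []
    prev = 0.0
    for x in samples:
        prev = x + a * prev
        out.append(prev)
    return out


if __name__ == "__main__":
    frames = [[1.0] * 4, [1.0] * 4, [1.0] * 4]
    print(de_emphasize(overlap_add(frames, hop=2), a=0.5))
```

When the analysis window sums to unity at 50% overlap, as the Hanning window
does, the overlapped regions of unmodified frames add back to the original
samples, which is why no extra normalization appears in this sketch.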

Having thus described the overall digital processing 
system in accordance with the invention, the noise sup- 
pression system of the invention will now be described, 
first by reference to the prior art techniques and then by 
describing the features and methods used in operation of 
the invention. 

Refer now to prior art noise suppression systems in 
FIGS. 2A and 2B. FIG. 2A depicts a system as taught 
by Ephraim and Malah which used the minimum mean 
square log estimators. The system shown in FIG. 2A is
a feed-forward system and does not fully eliminate noise
components. As taught by Ephraim and Malah, the
system does not disclose or suggest calculation of resid- 
ual noise estimators or any gain limiting or smoothing 
techniques nor does the system use recursive algorithms 
to learn the background noise. 

FIG. 2B shows a noise suppression system as taught 
by Borth. The system disclosed in FIG. 2B uses
post-processed signals in making the speech/noise decision.
However, this system specifically relies on detecting 
valleys in post-processed signals and thus is most useful 
for high noise applications. In addition, the system is 
intentionally simple and is not intended for sophisti- 
cated data processing applications. 

Refer now to FIGS. 2C, 2D and 4 which set forth in 
block diagram form various embodiments of the noise 
reduction system in accordance with the invention. It 
should be noted at the outset that one of the features of 
the invention which permits greater noise reduction is 
the manner in which the invention recursively uses 
stored signals to generate a plurality of estimators. It is 
also noted that the invention uses residual noise estima- 
tors as well as background noise estimators to generate 
other estimators. In addition, the invention uses voice 
activated decisions to generate the residual and back- 
ground noise estimators. Further, the noise reduction 
system of the invention uses a minimum mean square 
error log spectral amplitude estimator technique, which 
exploits the notion that principally the short time spec- 
tral amplitude rather than phase is important for speech 
intelligibility. Although the invention uses a minimum
mean square error log spectral amplitude estimator
mathematically similar to that taught by Ephraim, the
estimator is applied in a manner and method not
heretofore disclosed.

FIG. 4 in particular depicts a specific embodiment of
a noise reducer 50 in accordance with the invention. In
the following discussion, "k" denotes the spectral
component and "n" denotes the frame at time T=n. It must
be understood that the noise reducer 50 operates in the
frequency domain so that all processing is done on
spectral components of time-invariant samples of a frame. In
a specific embodiment, each segment of 128 samples
which characterize a frame of the noisy speech signal is
converted by means of the fast Fourier transform
processor FFT 40 into 64 spectral components in the
frequency domain, Y_1 through Y_64. A parameter "(n)"
indicates the "nth" frame. Labels in FIG. 4 correlate
with the following mathematical description.

For the noise reduction systems of FIGS. 2C, 2D and
4, the problem of formulating the correct speech
estimator, i.e. the amplitude estimate A_k, is the problem of
estimating the amplitude of each Fourier expansion
coefficient of the speech signal given the noisy signal. In
the minimum mean square log method, the Fourier
expansion coefficient of the speech signal as well as of
the noisy signal are modelled as statistically
independent Gaussian random variables. Mathematically, the
analysis can be expressed as follows:

Let X_k denote the kth Fourier expansion coefficient
of the speech signal and let Y_k denote the noisy
observations in the interval 0 (zero) to T. Further let

$$X_k = A_k e^{j\alpha_k}$$

and

$$Y_k = R_k e^{j\theta_k}$$

Then Â_k, the estimate of A_k, may be defined as the
estimate which minimizes the following distortion measure:

$$E\left[\left(\log A_k - \log \hat{A}_k\right)^2\right]$$

It can be shown that this amplitude estimator is given
by

$$\hat{A}_k = \exp\left\{E\left[\ln A_k \mid Y_k\right]\right\}$$

Using the assumed statistical model, it can be further
shown that the desired amplitude estimator, written
A_k(n) for frame n in the remainder of this description, is
obtained from R_k(n), the noisy signal, by a
multiplicative, non-linear gain function which depends only on
the a priori and the a posteriori signal to noise ratios,
SI_k(n) and ST_k(n), respectively. This gain function is
defined by:

$$G_k(n) = G\big(SI_k(n),\, ST_k(n)\big) = \frac{A_k(n)}{R_k(n)}$$

or

$$A_k(n) = G\big(SI_k(n),\, ST_k(n)\big) \cdot R_k(n)$$

where n denotes the interval of time, and k the spectral
component under consideration.

Thus, as is apparent from the above mathematical
formula, A_k, the proper amplitude estimator, is
determined by multiplying G_k, the proper gain estimator,
times R_k, the given noisy observed speech signal. Thus,
to determine A_k, G_k must be determined. In order to
determine G_k, first the a priori SNR, SI_k, and the a
posteriori SNR, ST_k, must be determined. According to
the invention, these values are adaptively determined,
stored, and recursively used to generate noise free
speech.

Refer now to FIGS. 2C and 2D which depict block
diagrams of noise reduction systems in accordance with
differing embodiments of the invention. Referring first
to FIG. 2C, there is shown in a noise reduction system
50 a rectangular to polar converter stage 12 for
separating each spectral component of an input frame X_k(n)
into amplitude and phase information.
Noisy amplitude information R_k(n) for each frame is
fed from rectangular to polar (RP) converter 12 to
amplitude estimator 13 and to signal to noise ratio (SNR)
estimator 15. RP converter 12 is operative to separate
the spectral amplitude components R_k from the phase
component e^{jθ_k} to permit processing of the spectral
components. SNR estimator 15 is responsive to inputs
from VOX switch 60 and to a memory 17. The output
of SNR estimator 15 is fed to gain estimator 16. Gain
estimator 16 is also responsive to inputs from VOX
switch 60 and memory 17. The output G_k(n) of gain
estimator 16 is coupled to amplitude estimator 13 which
is also fed the output R_k(n) of RP converter 12. The
output A_k(n) of amplitude estimator 13, i.e. the noise
suppressed signal, is the product G_k(n)·R_k(n) and is
fed through smoother 14 to polar to rectangular converter
18 and to memory 17. Memory 17 provides stored
instantaneous values of A_k(n), G_k(n), and SNR signals to
SNR estimator 15 and to gain estimator 16 for generating
SNR estimators and gain estimators G_k(n). Memory 17
also provides stored values to smoother 14. Polar to
rectangular converter 18 combines the estimated
amplitude A_k(n) with the noisy phase as the first step in the
signal reconstruction process in accordance with
conventional teachings. P to R converter 18 is the final
stage in the noise suppression stage 50 as shown in FIG.
2C.

Refer now to FIG. 2D. FIG. 2D is a block diagram of
another embodiment of the invention. The embodiment
in FIG. 2D is similar to the embodiment in FIG. 2C;
however, additional features are shown in FIG. 2D. In
particular, residual noise estimator 11 is included in the
feedback path for noise suppressed signals, and the
output of residual noise estimator 11 is used in generating
gain estimators in gain estimator 16. Residual noise
estimator 11 is responsive to a speech/no-speech (Y/N)
decision from VOX switch 60. Also shown in FIG. 2D
is a background noise estimator 19 included in the feed
forward path to SNR estimator 15. Background noise
estimator 19 is also responsive to a speech/no-speech
decision from VOX switch 60. The output, B_k(n), of
background estimator 19 feeds SNR estimator 15 which
is also fed by spectral power stage 9 and memory 17.

Refer now to FIG. 4, a more detailed embodiment of
the invention. Referring to FIG. 4, it can be seen that
the SNRs are determined based in part on the output of
adaptive background noise estimator 19. The
background noise estimator 19 is in turn controlled by
decisions from the VOX switch 60. The VOX switch 60 in
turn classifies speech segments as speech or non-speech.
Segments classified as no speech are processed by an
adaptive algorithm acting on the power of each spectral
component to generate adaptive background noise
estimators. Through use of the VOX decision, the system is
able to process frames with the knowledge that speech
or no speech is being processed at any one instant. In
fact, background estimator B_k(n) can be updated each
time a non-speech decision is made by the VOX.

Referring still to FIG. 4, it is seen that background
noise estimator 19 is fed from spectral power
calculation block 9 which provides the spectral power R_k²(n)
of the noisy observation R_k(n).

Background noise estimator 19 also is fed a
speech/no-speech (Y/N) signal from VOX switch 60. Given
the speech/no-speech decision and spectral power
input, background noise estimator 19 calculates the
background noise estimator B_k(n) according to the
following adaptive algorithm:

If speech, then

$$B_k(n) = B_k(n-1)$$

i.e. no updating is performed.

If no speech, then

$$B_k(n) = (1-a)\,B_k(n-1) + a\,N_k(n)$$

where a = a constant, and N_k(n) = R_k²(n), a being set to
0.1 in one embodiment. This adaptive algorithm is
performed by the adaptive noise estimator 19.

The output of adaptive (background) noise estimator
19 is thereafter fed to a posteriori estimator 53 and a
priori estimator 52. Thus, it can be seen that any
variation in the background noise is rapidly detected and
used to update the background noise estimator which is
used in the SNR estimator.

The a posteriori SNR is computed by the a posteriori
signal-to-noise ratio (SNR) estimator element 53 (see
also FIG. 10) according to the following formula:

$$ST_k(n) = \frac{R_k^2(n)}{B_k(n)}$$

wherein R_k(n) is the current observed noisy spectral
amplitude for the kth spectral component and B_k(n) is
the noise estimator for the current spectral component.
Given the background noise estimator and the a
posteriori estimator ST_k(n), the a priori SNR, SI_k(n), can be
determined at a priori estimator 52 using a decision
directed method.

The proposed estimator for the a priori SNR is a
decision directed estimator because the SNR is updated
on the basis of a previous amplitude estimate. The a
priori SNR is calculated by the a priori SNR estimator
element 52 recursively using the following formula:

$$SI_k(n) = a\,G_k^2(n-1)\,ST_k(n-1) + (1-a)\,P\big[ST_k(n) - 1\big]$$

where P(x) = x if x > 0, and 0 otherwise. From the
foregoing equation, it can be seen that the a priori SNR
is calculated using the prior values of the gain estimate
G_k(n-1) and the prior and current value of the
posteriori SNR, ST_k. The "a" is a weighting factor and has a
value in one embodiment between 0.9 and 0.95.

As a further explanation of the foregoing, and in
order to make it clear that the a priori estimator element
52 employs a past amplitude estimate, consider the
following: From the above discussion of the derivation of
the proper amplitude estimator it is known that:

$$A_k(n) = G_k(n)\,R_k(n)$$

and that:

$$ST_k(n) = \frac{R_k^2(n)}{B_k(n)}$$

Therefore, replacing terms, the foregoing equation for
the a priori SNR, SI_k(n), becomes:

$$SI_k(n) = a\,\frac{A_k^2(n-1)}{B_k(n-1)} + (1-a)\,P\big[ST_k(n) - 1\big]$$

Use of the past value of the gain estimate and the past
value of the a posteriori SNR, as explained hereinafter,
is equivalent to use of the past amplitude estimate and
the background noise estimate, as explained
hereinabove. A stored iteration (e.g., memory block of
element 59) holding the previous values as noted is
coupled in feedback relation to a priori SNR estimator
element 52, indicating the recursive nature of the
process.

Referring still to FIG. 4, once the a priori signal to
noise ratio and the a posteriori signal to noise ratio are
calculated, the results are used to determine a gain
estimator G_k(n) from a gain table 58 according to
conventional teachings.

In severe noise conditions, background musical noise
will appear for some prior art systems. In order to
overcome this problem, gain limiter 55 is introduced to
further modify the gain estimate G_k(n) to G_k'(n). The
effect of limiter 55 is to create a spectral floor which
masks musical noise. This approach is based on the fact
that broadband noise is more pleasant to a hearer than
narrow band noise. The limiting threshold may be
controllable from an external source 56 (not shown). The
gain limiting algorithm limits the lower bound of the
gain to a preset value, allowing the operator to change
the spectral floor according to environment noise
conditions.

The limited gain estimate G_k'(n) is then fed to
amplitude estimator 59. In amplitude estimator 59, the noisy
signal R_k(n) is multiplied times the modified gain
estimate G_k'(n) to generate a noise suppressed signal A_k(n).

The purpose of smoother stage 57 is to eliminate
residual noise components observed as isolated peaks by
using a non-linear smoothing algorithm based on
residual noise estimates and stored signals. It implements the
algorithm depicted in FIG. 14. The residual noise
estimator 11 performs adaptive estimation based on VOX
decisions. It implements the algorithm depicted in
connection with FIG. 8. The residual noise estimator 11
uses a dual time constant scheme based upon adjacent
prior estimates and reduces spectral peaks due to
random variations in residual noise.

The residual noise estimator is used as a threshold for
activating the non-linear smoother 57.

Referring again to non-linear smoother 57 in FIG. 4,
the smoother 57 modifies the output of amplitude
estimator 59 using a non-linear smoothing algorithm based
on inputs from a memory which is a storage circular
buffer 17. This buffer 17 stores L previous squared
values of each prior spectral estimate A_k(n-1),
A_k(n-2) . . . A_k(n-L). The smoother 57 is activated
selectively depending on whether the residual noise
estimate exceeds a predetermined threshold THR. The
smoothed amplitude estimate element 13 receives the
smoothed power spectral estimate and computes its
square root to obtain the final smoothed spectral
amplitude estimate.

Afterwards, the final smoothed spectral amplitude
estimate is combined with the noisy phase at PR
converter 52 as the first step in signal reconstruction by
The proposed AGC algorithm gives the system im-
converting the spectral amplitude and phase informa- munity against energy envelope distortions, thus pre- 
tion in polar notation into real and imaginary compo- serving the original energy envelope of the clean 
nents in rectangular notation. speech. Otherwise, the intelligibility of the enhanced 

Refer now to FIG. 5, which describes the post-proc- 5 speech may be degraded, 
essing step. The enhanced spectral components are time The foregoing description has provided a functional 
Fourier transformed 70 and the signal is reconstructed description of the noise reduction system according to 
using the weighted overlap and add method 81. the invention, including various embodiments thereof. 

The de-emphasis step 82 restores the natural speech The following discussion will describe the operation of 
spectrum roll-off using the following recursive (time 10 various processes and methods mentioned above at 
domain) equation acting on the reconstructed samples: various stages of the invention using flow diagrams as 

illustrations. 

*(*)= Wfn)+6^n-i) Re f er now lo F1G s. 6A and 6B. A flow chart illus- 

trating the overall operation of the entire digital pro- 
where 15 cessing system as shown in FIG. 1 is given in FIG. 6A 

W(n) = Reconstructed sample and commues to FIG. 6B. Functional blocks 511, 513, 

X(n)=De-emphasized sample 514 and S16 of FIGS 6A and 6B are described in more 

X(n- 1) = Previous de-emphasized sample detai , in FIGS 7f 8 9 and 14 respectively. 

b = De-emphasis coefficient Referring now to FIG. 6A, the operation of the sys- 

The above variables X, Y and W depict recursive 20 |em begins at the starting block 50 i which corresponds 
equations of the pre-emphasis and de-emphasis steps in tQ the prc .p roce ssing stage 30 in FIG. 1. Block 501 
the time domain, relating consecutive samples within a repreS ents the powering up of the system and the initial- 
frame, and are not related to the spectral components ization of thg bu ff ers / memories and counters. The in- 
defined above. coming signal is digitized by ADC 20 at a sampling rate 
The goal of the output AGG 90 is to restore the ongi- 25 Qf ^ samp]es second Each samp , e . $ stored in ft 

nal speech energy envelope. The amplitude estimate WQrki bufTer at SQ2 and pre . cmphasi2ed in step 

algorithm assumes the frequency components to be 5()4 In tioil| the invenl ion performs signal analysis 

statistical independent random variables. This fact can Qn frames of 12g , es corresponding t0 16 mi |, isec . 

affect the overall energy of the clean speech. In order to Qnds ffame Frames Qverl b ^ % whefeb each 

preserve the original energy envelope of the signal, the 30 ^ {% constructed b usi 64 new , es and by 

followmg AGC algorithm is applied: u$i ^ , w , Qf the ious frame Count x 

When the VOX detects a speech frame, the energy in / lQ fiA - s ft ^ coun ^ nSQd t0 check if a new 

before and after noise cancelling and the total back- b)ock f 64 [e$ have been recejved anrf are fead [Q 

ground noise estimate are computed respectively as be sed ^ hen count j ls 64> a new anal is 

follows: 35 frame is formed. 

Next in FIG. 6A ( the AGC control parameters are 
Ettn) = *z RkHn) computed as a function of slow varying trends in the 

*-i signal's energy using an exponential averager with a 

w 4 0 long time constant that is updated with the energy con- 

Efl(n) » I AkHn) tent of voiced frames as they are detected by the VOX. 

■ *" 1 When the average value reaches a predetermined 

64 threshold, the AGC parameters are changed in order to 

E.\<n) - ^ i Bkhn) tc eep the signal between optimal sample levels. Steps 

45 501 through 508 are performed primarily by preproces- 

An estimation of the speech energy is made by sub- so ^ 3 ?, of T IG ' , . 
stracting the total background noise estimate from the Following completion of preprocessing step 508 a 

total energy before noise cancelling: * short time f ou "I r JT Sf0rm 15 P erformed usin * a 64 

point complex FET algonthm. Next, a rectangular to 

Ejin)-Ettn)-E\in) 50 polar conversion is used to calculate the noisy spectral 

amplitude Rjt(n) and the frame is now ready for the 

Then the output AGC gain is evaluated as follows: amplitude estimation step described in FIG. 7 below. 

Referring now to FIG. 6B, steps are shown which 

£ . n) indicate the interactive operation of the VOX switch 

GAGCi") = £( ^ rt j * 55 w ith the noise reduction system of the invention after 

completion of the amplitude estimation step. As shown 

and each frame "n" is multiplied by its corresponding in FIG- 6B, initially, the VOX switch decides whether 

G^cdn) gain before being converted in the DAC step. a noisy frame contains speech or no-speech. When the 

When the VOX detects a "non-speech" frame, an y OX detects a speechless frame, two actions take place. 

exponentially averaged value of the last Gagc'is used as 60 First the noise background estimate is determined 

the gain factor for the first 2 seconds of non-speech recursively as shown in FIG. 9. Secondly, the residual 

frames. After 2 seconds of VOX detected "non-speech" noise estimate is updated using a fast attack, slow decay 

frames, the gain is updated using the following recur- scheme, as more fully described in FIG. 8 hereafter. 

s i on; The corresponding spectral power A*(n) of the en- 

65 hanced components is stored in a circular buffer (mem- 

GAGdn)= $'G A Gdn-\) ory) which, in the preferred embodiment, contains the 

last five squared values of Ajt, i.e. A*(n— I). - • - 

whereO<0<l A*(n-5). 
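The framing scheme described for FIG. 6A (128-sample analysis frames built from 64 new samples plus the 64 samples of the previous block, with a sample counter triggering each new frame) can be sketched as follows. This is a minimal illustration only: the function name `frames_from_samples` and the all-zero initial history are assumptions, not details taken from the figures.

```python
import numpy as np

FRAME_LEN = 128   # samples per analysis frame (16 ms at 8000 samples/s)
HOP = 64          # new samples per frame, i.e. 50% overlap

def frames_from_samples(samples):
    """Yield 128-sample frames made of the previous 64 samples plus 64 new ones."""
    previous = np.zeros(HOP)      # assumed initial state: silence
    block = []                    # len(block) plays the role of "count 1"
    for s in samples:
        block.append(s)
        if len(block) == HOP:     # a new block of 64 samples is ready
            new = np.asarray(block, dtype=float)
            yield np.concatenate([previous, new])
            previous = new
            block = []            # sample counter is zeroed

signal = np.arange(256, dtype=float)
frames = list(frames_from_samples(signal))
```

Each yielded frame shares its first half with the second half of the frame before it, which is what later allows the 50% overlap-add reconstruction.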
After the smoothing step 516 eliminates randomly
distributed peaks in the spectrum, the resulting spectral 
estimate is combined with the noisy phase as shown in 
block 517. 

The enhanced complex spectral components are then time transformed by an inverse FFT method. The resulting frame is weighted and added with 50% overlap to the previous frame, leading to the reconstructed signal 519. Next, the digitized samples are converted to analog form by the digital to analog converter 520, at which time processing for a frame is completed. The frame counter, count 2, is incremented, the sample counter, count 1, is zeroed, and the processing of a new frame begins.
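The weighted overlap-and-add reconstruction described above can be sketched as follows. The text does not name the weighting window, so a periodic Hann window is assumed here because its copies shifted by half a frame sum to one; the function name `weighted_overlap_add` is likewise illustrative, and plain signal slices stand in for the inverse-FFT output frames.

```python
import numpy as np

def weighted_overlap_add(frames, window, hop):
    """Window each time-domain frame and sum it into the output at its hop offset."""
    n = len(window)
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n] += window * frame
    return out

N, HOP = 128, 64
# Assumed window: periodic Hann, for which w[m] + w[m + N/2] = 1.
window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(N) / N)

rng = np.random.default_rng(0)
x = rng.standard_normal(HOP * 5)
# Stand-ins for the inverse-FFT output of four consecutive frames.
frames = [x[i * HOP : i * HOP + N] for i in range(4)]
y = weighted_overlap_add(frames, window, HOP)
```

With this window choice, every sample covered by two frames is reconstructed exactly; only the first and last half-frames, which are covered once, remain attenuated.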

Because of the real time characteristics of the system, the acquisition of new samples and the processing of frames in accordance with FIGS. 6A and 6B are not serial but are parallel processes. Calculations are in progress for an old sample while a new sample is being acquired. Control signals ensure that processing proceeds in an orderly fashion.

Refer now to FIG. 7 which illustrates the steps in the spectral amplitude estimation calculation step 515. As shown in FIG. 7, 64 spectral samples per frame are obtained from the FFT. For each frame, the following steps are performed. First, the background noise estimate B_k(n) is calculated according to the steps in FIG. 9. Next, the a posteriori signal to noise ratio is calculated using the noisy observation. A flow chart depicting the a posteriori calculation steps is shown at FIG. 10.

Next, the a priori signal to noise ratio is calculated 
using the decision directed approach. FIG. 11 depicts 
the steps for computing the a priori signal to noise ratio. 

Next, the gain is computed using the lookup table, in reliance on the a priori and the a posteriori computed estimates. A gain table according to one embodiment of the invention is shown at FIG. 12. Next, an enhanced spectral amplitude estimator A_k(n) is obtained by multiplying the noisy spectral amplitude R_k(n) by the gain estimator G_k(n).
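The per-frame steps of FIG. 7 can be sketched as follows. Several things here are assumptions rather than the patented method: the gain lookup table of FIG. 12 is not reproduced in the text, so a Wiener-type gain computed from the a priori SNR stands in for it; the a priori update uses the standard decision-directed form with a weighting factor in the stated 0.9 to 0.95 range; and the function name `estimate_amplitudes` and the `gain_floor` value are hypothetical.

```python
import numpy as np

def estimate_amplitudes(R, B, prev_gain, prev_post_snr, a=0.92, gain_floor=0.1):
    """One frame of spectral amplitude estimation for all spectral components.

    R: noisy spectral amplitudes, B: background noise estimates,
    prev_gain / prev_post_snr: gain and a posteriori SNR of the previous frame.
    """
    post_snr = R**2 / B                                   # a posteriori SNR
    # Decision-directed a priori SNR: past amplitude estimate plus
    # half-wave-rectified current a posteriori SNR.
    prio_snr = (a * prev_gain**2 * prev_post_snr
                + (1.0 - a) * np.maximum(post_snr - 1.0, 0.0))
    gain = prio_snr / (1.0 + prio_snr)    # ASSUMED stand-in for the FIG. 12 table
    gain = np.maximum(gain, gain_floor)   # gain limiter: spectral floor
    A = gain * R                          # enhanced spectral amplitude
    return A, gain, post_snr

R = np.array([1.0, 2.0, 0.5])
B = np.ones(3)
A, gain, post_snr = estimate_amplitudes(R, B, np.ones(3), np.ones(3))
```

The returned `gain` and `post_snr` would be fed back as `prev_gain` and `prev_post_snr` for the next frame, which is what makes the a priori estimate recursive.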

Refer now to FIG. 8. FIG. 8 describes the steps for calculating the residual noise estimator. In FIG. 8, a VOX detects a speechless frame and determines the characteristics of the residual noise. In FIG. 8, N_k(n) represents the estimated power of the kth spectral component of a noise frame (i.e. N_k(n) = A_k^2(n)).

As shown in FIG. 8, once N_k(n) is calculated, residual estimator RPSD_k(n) is adaptively updated using a dual time constant averager. The time constant "E" is set to 1 at step 703 if the present component is greater than the residual estimator; otherwise, "E" is set to 0.05 at step 704, giving the averager a fast attack, slow decay behavior. Once the residual noise estimate is derived for the kth component, a counter is reset at step 706 and calculation is repeated for all the 64 spectral components. The output is used in step 516 to smooth the power spectrum.
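The dual time constant averager can be sketched as follows. The values 1 and 0.05 for "E" come from the description; the first-order update form E*N + (1 - E)*RPSD is an assumption, since the exact recursion appears only in FIG. 8, and the function name `update_residual` is illustrative.

```python
import numpy as np

def update_residual(rpsd, A):
    """Update the residual noise estimate from one speechless frame.

    rpsd: previous residual estimate per component, A: enhanced amplitudes.
    """
    N = A**2                                # power of each noise-frame component
    E = np.where(N > rpsd, 1.0, 0.05)       # fast attack, slow decay
    return E * N + (1.0 - E) * rpsd         # assumed first-order averager

rpsd = update_residual(np.array([1.0, 1.0]), np.array([2.0, 0.0]))
```

In the example the first component jumps straight to the new peak, while the second decays by only 5% toward zero. The update would be called only on frames the VOX marks as non-speech.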

Refer now to FIG. 14. FIG. 14 illustrates the spectral smoothing algorithm. The spectral smoother method uses previous spectral power estimates A_k^2(n-1), ... for each component. First, the value of the current estimator is compared to the value of the residual noise estimator generated previously. If the estimated spectral power is greater than the residual estimator, there is a high probability that speech is present at that frequency so that the smoother is not activated. If the estimated spectral value is lower, it is replaced by the minimum value A_k^2(n-1), ... in the buffer which is thereafter used in reconstructing the signal. This mechanism eliminates strong variations between frames produced by noise at determined frequencies. Refer now to FIG. 2C. FIG. 2C is an embodiment of the invention wherein spectral smoothing is performed on the amplitude estimator.
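The smoothing rule of FIG. 14 can be sketched as follows, using a five-deep buffer of past spectral powers as in the preferred embodiment. The class name `SpectralSmoother`, the choice to store the unsmoothed power in the buffer, and the pass-through behavior while the buffer is still empty are assumptions made for illustration.

```python
from collections import deque

import numpy as np

class SpectralSmoother:
    """Non-linear smoother that replaces sub-threshold powers by the buffer minimum."""

    def __init__(self, depth=5):
        self.history = deque(maxlen=depth)   # circular buffer of past powers

    def smooth(self, A, rpsd):
        power = A**2
        if self.history:
            buffer_min = np.minimum.reduce(list(self.history))
            # Above the residual estimate: likely speech, keep as is.
            # Otherwise: replace by the minimum of the stored values.
            smoothed = np.where(power > rpsd, power, buffer_min)
        else:
            smoothed = power                 # assumed: nothing to compare against yet
        self.history.append(power)           # assumed: store the unsmoothed power
        return np.sqrt(smoothed)             # back to spectral amplitude

smoother = SpectralSmoother(depth=5)
threshold = np.array([0.5, 0.5])
first = smoother.smooth(np.array([1.0, 0.1]), threshold)
second = smoother.smooth(np.array([2.0, 0.6]), threshold)
```

In the second call the first component exceeds the residual estimate and passes through, while the second component falls below it and is replaced by the smallest stored value, which removes the isolated peak.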

The invention has now been explained with reference 
to specific embodiments. Other embodiments, including 
realizations in hardware and realizations in other pre- 
programmed or software forms, will be apparent to 
those of ordinary skill in the art. It is therefore not 
intended that the invention be limited except as indi- 
cated by the appended claims. 

What is claimed is: 

1. A digital processing method for reducing the noise in noisy speech signals, including the steps of:

(a) generating background noise estimates from noisy speech and storing said background noise estimates;

(b) generating adaptive current noise estimates from 
current noisy speech signals and stored back- 
ground noise estimates; 

(c) generating current gain estimates from adaptive 
current noise estimates and past speech estimates; 
and 

(d) using current gain estimates and current noisy 
speech to obtain current speech estimates, 

wherein said step of using adaptive current noise estimates and past speech estimates to obtain current gain estimates includes the step of limiting the lower limit of the gain estimate to eliminate musical noise, and

wherein said step of generating adaptive current 
noise estimates includes employing results of a 
speech/no speech decision from information ob- 
tained from current signal input to distinguish said 
noisy speech from background noise. 

2. A digital processing method according to claim 1 
and wherein said step of using current gain estimates 
and current noisy speech to obtain current speech esti- 
mates comprise the step of applying an automatic gain 
control algorithm to estimated speech in order to re- 
store the original energy envelope of the speech. 

3. The digital method of claims 1 or 2 and wherein 
said current noise estimates are background noise esti- 
mates. 

4. The invention of claim 1 further including the step of using a speech/no speech decision to select an algorithm when generating decision directed estimates.

5. A digital processing method for reducing the noise 
in noisy speech signal, comprising the steps of: 

(a) generating amplitude estimates from noisy speech; 

(b) generating residual noise estimates from said am- 
plitude estimates by operation of a voice operated 
switch; and 

(c) generating adaptive residual noise estimates from 
said amplitude estimates when speech is not pres- 
ent; and 

(d) using said adaptive residual noise estimates for 
smoothing speech signals. 

6. A method for reducing the noise in noisy signals 
containing speech, said method comprising the steps of: 



(a) generating, from Fourier expansion coefficients of said noisy signals, background noise estimates, and storing said background noise estimates;

(b) generating thereafter, from Fourier expansion coefficients of said signals and said stored background noise estimates, adaptive current noise estimates;

(c) generating thereafter, from said adaptive current noise estimates and past speech estimates, current gain estimates; and

(d) producing thereafter, from said current gain estimates and current digitized noisy signals, current speech estimates, said current speech estimates for use thereafter as past speech estimates,

wherein said step (c) includes the step of limiting the lower limit of said gain estimate to eliminate musical noise, and

wherein said step (b) includes applying a speech/no speech decision to said noisy signals containing speech to identify said current speech estimates with a signal segment containing speech.

7. A method for reducing noise in noisy signals containing speech, said noisy signals being divided into time invariant segments, said method including the steps of:

(a) generating, from Fourier expansion coefficients of said segments of said noisy signals, amplitude estimates;

(b) thereafter generating, from said amplitude estimates, (i) residual noise estimates from said amplitude estimates where speech is present in a current segment, and (ii) adaptive residual noise estimates where speech is not present in a current segment; and

(c) smoothing said noisy signal containing speech with said adaptive residual noise estimates to suppress noise.



8. A digital processing method for reducing the noise 
in noisy speech signals, including the steps of: 

(a) generating, from Fourier expansion coefficients of 
segments of said noisy speech signals as amplitude 
estimates; 

(b) generating background noise estimates from said amplitude estimates, including employing results of a speech/no speech decision (Y/N) from information obtained from current signal input to distinguish signals containing speech from background noise;

(c) generating first signal-to-noise estimates from said 
background noise estimates and said amplitude 
estimates (a posteriori SNR); 

(d) generating decision directed signal-to-noise esti- 
mates recursively from said background noise esti- 
mates updated on the basis of previous speech am- 
plitude estimates (a priori SNR); 

(e) generating current gain estimates from said first signal-to-noise estimate and said decision directed signal-to-noise estimates; and

(f) using current gain estimates and current noisy speech to obtain current speech amplitude estimates.

9. The method according to claim 8 wherein said step 
of using current estimates further includes the step of 
limiting the gain estimates to gain limited estimates to 
eliminate musical noise. 

10. The method according to claim 8 further including the steps of employing said current speech amplitude estimates using current estimates and results of a speech/no speech decision (Y/N) from information obtained from current signal input to generate a threshold signal for adaptive residual noise for obtaining smoothed amplitude estimates.

***** 